Andrews County
The Race to Build the DeepSeek of Europe Is On
As Europe's longstanding alliance with the US falters, its push to become a self-sufficient AI superpower has become more urgent. As the relationship between the US and its European allies shows signs of strain, AI labs across the continent are searching for inventive ways to close the gap with American rivals that have so far dominated the field. With rare exceptions, US-based firms outstrip European competitors across the AI production line--from processor design and manufacturing, to datacenter capacity, to model and application development. Likewise, the US has captured a massive proportion of the money pouring into AI, reflected in the performance last year of its homegrown stocks and the growth of its economy. The belief in some quarters is that the US-based leaders--Nvidia, Google, Meta, OpenAI, Anthropic, and the like--are already so entrenched as to make it impossible for European nations to break their dependency on American AI, mirroring the pattern in cloud services.
- Information Technology (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy (0.96)
- Government > Military (0.69)
A Multimodal Conversational Agent for Tabular Data Analysis
Awad, Mohammad Nour Al, Ivanov, Sergey, Tikhonova, Olga, Khodnenko, Ivan
Abstract--Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation in an interactive, context-aware dialogue with users, including voice interaction, while maintaining high performance. The system lets users query datasets with voice or text instructions and receive answers as plots, tables, statistics, or spoken explanations. Built on LLMs, the proposed design combines the OpenAI Whisper automatic speech recognition (ASR) system, the Qwen-coder code-generation LLM, custom sandboxed execution tools, and the Coqui library for text-to-speech (TTS) within an agentic orchestration loop. Unlike text-only analysis tools, it adapts responses across modalities and supports multi-turn dialogues grounded in dataset context. In an evaluation of 48 tasks on three datasets, our prototype achieved 95.8% accuracy with model-only generation time under 1.7 seconds (excluding ASR and execution time). A comparison across five LLM sizes (1.5B-32B) revealed accuracy-latency-cost trade-offs, with a 7B model providing the best balance for interactive use. By routing between conversation with the user and code execution constrained to a transparent sandbox, while grounding prompts in schema-level context, the Talk2Data agent reliably retrieves actionable insights from tables while making computations verifiable. Beyond the Talk2Data agent itself, we discuss implications for human-data interaction, trust in LLM-driven analytics, and future extensions toward large-scale multimodal assistants. Interacting with data often requires programming skills or statistical expertise, creating barriers for managers, analysts, and other non-technical users [1], [2]. Natural language interfaces (NLIs) aim to improve this information-seeking process by allowing users to query data conversationally [3], [4].
At the same time, voice interfaces are becoming increasingly common in daily life, yet existing voice assistants remain limited: they can answer factual questions or control devices, but they lack the analytical capabilities needed for meaningful data exploration.
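The abstract above mentions grounding prompts in schema-level context and routing between conversation and code execution. The following is only a rough sketch of what those two steps could look like, not the authors' implementation. All names here (`infer_type`, `schema_context`, `build_prompt`, `route`, `ANALYSIS_HINTS`) are hypothetical, and the keyword-based router is a simple stand-in for whatever routing decision the actual agent makes.

```python
import re


def infer_type(values):
    """Guess a coarse column type from sample values (illustrative only)."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "text"


def schema_context(rows):
    """Summarise a table (a list of dicts) as column names and coarse types."""
    columns = list(rows[0].keys())
    return "\n".join(f"- {c}: {infer_type([r[c] for r in rows])}" for c in columns)


def build_prompt(rows, question):
    """Prepend the schema summary so the model sees column names, not raw data."""
    return f"Table columns:\n{schema_context(rows)}\n\nUser question: {question}"


# Assumption: a fixed keyword list stands in for a learned routing decision.
ANALYSIS_HINTS = {"average", "mean", "plot", "count", "sum", "max", "min", "correlation"}


def route(question):
    """Return 'code' for analytical requests and 'chat' for everything else."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return "code" if words & ANALYSIS_HINTS else "chat"


rows = [{"city": "Oslo", "price": "12.5"}, {"city": "Rome", "price": "9"}]
prompt = build_prompt(rows, "What is the average price?")
```

In this sketch only the column names and coarse types reach the prompt, which keeps the prompt short and is one plausible reading of "schema-level" grounding.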
- Asia > Russia (0.14)
- Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.05)
- North America > United States > Texas > Andrews County (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Data Science > Data Mining (0.67)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Andrews County (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > Texas > Andrews County (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
A Survey of Dimension Estimation Methods
Binnie, James A. D., Dłotko, Paweł, Harvey, John, Malinowski, Jakub, Yim, Ka Man
It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, hence the complexity of the dataset at hand. A great variety of dimension estimators have been developed to find the intrinsic dimension of the data but there is little guidance on how to reliably use these estimators. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit: tangential estimators which detect a local affine structure; parametric estimators which rely on dimension-dependent probability distributions; and estimators which use topological or metric invariants. The paper evaluates the performance of these methods, as well as investigating varying responses to curvature and noise. Key issues addressed include robustness to hyperparameter selection, sample size requirements, accuracy in high dimensions, precision, and performance on non-linear geometries. In identifying the best hyperparameters for benchmark datasets, overfitting is frequent, indicating that many estimators may not generalise well beyond the datasets on which they have been tested.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Education (0.45)
- Government (0.45)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals
Kimura, Tomoyoshi, Li, Xinlin, Hanna, Osama, Chen, Yatong, Chen, Yizhuo, Kara, Denizhan, Wang, Tianshi, Li, Jinyang, Ouyang, Xiaomin, Liu, Shengzhong, Srivastava, Mani, Diggavi, Suhas, Abdelzaher, Tarek
Standard multimodal self-supervised learning (SSL) algorithms regard cross-modal synchronization as implicit supervisory labels during pretraining, thus posing high requirements on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and the non-interpretability of time-series signals result in abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting by facilitating efficient cross-modal alignment of pretrained unimodal representations. InfoMAE achieves efficient cross-modal alignment with limited data pairs through a novel information theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications are performed to evaluate InfoMAE's pairing efficiency to bridge pretrained unimodal models into a cohesive joint multimodal model. InfoMAE enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- Oceania > Australia > New South Wales > Sydney (0.05)
- Information Technology (0.93)
- Government (0.68)
Riemannian Geometry for the classification of brain states with intracortical brain-computer interfaces
Marin-Llobet, Arnau, Manasanch, Arnau, Sanchez-Manso, Sergio, Tresserras, Lluc, Zhang, Xinhe, Hua, Yining, Zhao, Hao, Torao-Angosto, Melody, Sanchez-Vives, Maria V, Porta, Leonardo Dalla
This study investigates the application of Riemannian geometry-based methods for brain decoding using invasive electrophysiological recordings. Although previously employed in non-invasive recordings, the utility of Riemannian geometry for invasive datasets, which are typically smaller and scarcer, remains less explored. Here, we propose a Minimum Distance to Mean (MDM) classifier using a Riemannian geometry approach based on covariance matrices extracted from intracortical Local Field Potential (LFP) recordings across various regions during different brain state dynamics. For benchmarking, we evaluated the performance of our approach against Convolutional Neural Networks (CNNs) and Euclidean MDM classifiers. Our results indicate that the Riemannian geometry-based classification not only achieves a superior mean F1 macro-averaged score across different channel configurations but also requires up to two orders of magnitude less computational training time. Additionally, the geometric framework reveals distinct spatial contributions of brain regions across varying brain states, suggesting a state-dependent organization that traditional time series-based methods often fail to capture. Our findings align with previous studies supporting the efficacy of geometry-based methods and extend their application to invasive brain recordings, highlighting their potential for broader clinical use, such as brain-computer interface applications.
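To make the Minimum Distance to Mean idea concrete, the following is a small sketch under strong simplifying assumptions. It handles only two channels, so the eigendecomposition has a closed form, and it uses the log-Euclidean metric as one possible Riemannian metric on covariance matrices. The abstract does not state which metric the authors use, and all function names and the synthetic data are hypothetical.

```python
import math
import random


def sym2_apply(mat, func):
    """Apply a scalar function to a symmetric 2x2 matrix via its eigenvalues."""
    (a, b), (_, c) = mat
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    if radius == 0.0:  # multiple of the identity
        return [[func(lam1), 0.0], [0.0, func(lam1)]]
    # Projector onto the lam1 eigenspace: (A - lam2 * I) / (lam1 - lam2).
    p00, p01, p11 = (a - lam2) / (2 * radius), b / (2 * radius), (c - lam2) / (2 * radius)
    f1, f2 = func(lam1), func(lam2)
    off = (f1 - f2) * p01
    return [[f1 * p00 + f2 * (1 - p00), off], [off, f1 * p11 + f2 * (1 - p11)]]


def covariance(trial):
    """Sample covariance of a 2-channel trial given as (x, y) pairs."""
    n = len(trial)
    mx = sum(x for x, _ in trial) / n
    my = sum(y for _, y in trial) / n
    sxx = sum((x - mx) ** 2 for x, _ in trial) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in trial) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in trial) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def log_euclidean_mean(covs):
    """Average the matrix logarithms, then map back with the matrix exponential."""
    logs = [sym2_apply(c, math.log) for c in covs]
    avg = [[sum(m[i][j] for m in logs) / len(logs) for j in range(2)] for i in range(2)]
    return sym2_apply(avg, math.exp)


def log_euclidean_distance(a, b):
    """Frobenius norm of the difference of the matrix logarithms."""
    la, lb = sym2_apply(a, math.log), sym2_apply(b, math.log)
    return math.sqrt(sum((la[i][j] - lb[i][j]) ** 2 for i in range(2) for j in range(2)))


def fit_mdm(trials, labels):
    """Compute one mean covariance matrix per class label."""
    by_class = {}
    for trial, label in zip(trials, labels):
        by_class.setdefault(label, []).append(covariance(trial))
    return {label: log_euclidean_mean(covs) for label, covs in by_class.items()}


def predict_mdm(means, trial):
    """Assign the label whose class mean is closest to the trial covariance."""
    cov = covariance(trial)
    return min(means, key=lambda label: log_euclidean_distance(cov, means[label]))


def make_trial(rng, sign, n=200):
    """Synthetic 2-channel trial; 'sign' sets the sign of the channel correlation."""
    trial = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        trial.append((x, sign * x + 0.5 * rng.gauss(0.0, 1.0)))
    return trial


rng = random.Random(0)
train = [make_trial(rng, +1) for _ in range(5)] + [make_trial(rng, -1) for _ in range(5)]
means = fit_mdm(train, ["state_A"] * 5 + ["state_B"] * 5)
```

Fitting here amounts to averaging a handful of small matrices, with no iterative optimisation. Real recordings with many channels would need a general eigendecomposition routine in place of the 2x2 closed form.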
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Texas > Andrews County (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (0.93)
A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages
Joel, Sathvik, Wu, Jie JW, Fard, Fatemeh H.
Large Language Models (LLMs) have shown impressive capabilities in code generation for popular programming languages. However, their performance on Low-Resource Programming Languages (LRPLs) and Domain-Specific Languages (DSLs) remains a significant challenge, affecting millions of developers (3.5 million users in Rust alone) who cannot fully utilize LLM capabilities. LRPLs and DSLs encounter unique obstacles, including data scarcity and, for DSLs, specialized syntax that is poorly represented in general-purpose datasets. Addressing these challenges is crucial, as LRPLs and DSLs enhance development efficiency in specialized domains, such as finance and science. While several surveys discuss LLMs in software engineering, none focus specifically on the challenges and opportunities associated with LRPLs and DSLs. Our survey fills this gap by systematically reviewing the current state, methodologies, and challenges in leveraging LLMs for code generation in these languages. We filtered 111 papers from over 27,000 published studies between 2020 and 2024 to evaluate the capabilities and limitations of LLMs in LRPLs and DSLs. We report the LLMs used, benchmarks, and metrics for evaluation, strategies for enhancing performance, and methods for dataset collection and curation. We identified four main evaluation techniques and several metrics for assessing code generation in LRPLs and DSLs. Our analysis categorizes improvement methods into six groups and summarizes novel architectures proposed by researchers. Despite various techniques and metrics, a standard approach and benchmark dataset for evaluating code generation in LRPLs and DSLs are lacking. This survey serves as a resource for researchers and practitioners at the intersection of LLMs, software engineering, and specialized programming languages, laying the groundwork for future advancements in code generation for LRPLs and DSLs.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > Canada > British Columbia > Regional District of Central Okanagan > Kelowna (0.04)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
EB-NeRD: A Large-Scale Dataset for News Recommendation
Kruse, Johannes, Lindskow, Kasper, Kalloori, Saikishore, Polignano, Marco, Pomo, Claudio, Srivastava, Abhishek, Uppal, Anshuk, Andersen, Michael Riis, Frellsen, Jes
Personalized content recommendations have been pivotal to the content experience in digital media, from video streaming to social networks. However, several domain-specific challenges have held back the adoption of recommender systems in news publishing. To address these challenges, we introduce the Ekstra Bladet News Recommendation Dataset (EB-NeRD). The dataset encompasses data from over a million unique users and more than 37 million impression logs from Ekstra Bladet. It also includes a collection of over 125,000 Danish news articles, complete with titles, abstracts, bodies, and metadata, such as categories. EB-NeRD served as the benchmark dataset for the RecSys '24 Challenge, which demonstrated how the dataset can be used to address both technical and normative challenges in designing effective and responsible recommender systems for news publishing. The dataset is available at: https://recsys.eb.dk.
- Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Research Report (0.50)
- Overview (0.46)
- Media > News (1.00)
- Information Technology > Services (0.66)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)